Dynamic Resource Allocation in a Multimodal System

Field of the Invention 

The present invention relates to dynamic resource allocation in a multimodal system.

Background of the Invention 

Multimodal systems are systems which permit a user to provide input in different 
modalities, such as speech or gesture, in parallel, in sequence or as alternatives. The 
processing of an input modality is typically split up into a number of tasks carried out by
corresponding functionality, herein referred to as task entities. The results of processing of
input via one modality can be combined or fused with the results obtained from the
processing of other modalities at any stage in the processing chain; such fusion is not
restricted to being carried out at the top level by the application to which the inputs are directed.


The processing demands of modalities such as speech can be very high if, for
example, a large vocabulary is to be catered for, and this has restricted the adoption of
modalities such as speech as input interfaces for mobile devices, which typically have very
limited processing power and memory available. However, advances in wireless
communication, ad hoc networks and human language technologies are set to enable
mobile devices to offload processing tasks requiring specialized or powerful processing
resources to infrastructure-based task entities. Figure 1 of the accompanying drawings
illustrates a multimodal input system for a mobile device in which the symbolic
recognition and syntactic analysis tasks involved in processing speech and gesture
modalities are carried out by remote task entities. As can be seen, the outputs of the
feature-extraction task entities of the mobile device are passed to the remote
symbolic-recognition task entities over a communication channel; similarly, the outputs of the
remote syntactic-analysis task entities are passed to the semantic-analysis task entities of
the mobile device over the same or another communication channel. The setting up of the
ad hoc organization of local and remote task entities is effected by a modality manager of
the mobile device.




As a consequence, it may be expected that in the near future mobile device users will be 
able to use a plethora of interaction modalities such as speech, gesture recognition, etc. 
Users will also expect that their appliances will be able to interact seamlessly, providing
a multimodal user interface onto services and information regardless of the communication
technology used by the device (for example, technologies such as 3G cellular, 802.11
wireless LAN, and Bluetooth).

In a world of disaggregated computing, the bandwidth between input clients (such as, but
not limited to, mobile devices) and computing resources serving as task entities will
dramatically influence where and to what degree multimodal input (with or without fusion)
can be carried out effectively. At certain points in the communications infrastructure used
by the input clients, bandwidth is likely to be less than needed. For example, where a
mobile device has a collection of co-operating input clients that utilise internet-based task
entities via an 802.11 network to process multiple input modalities, the bandwidth of the
interconnection between the mobile device and the task entities will be influenced by other
users in the local vicinity and the environment. A fall in the available bandwidth will
impact all modalities currently being handled.

It is an object of the present invention to facilitate multimodal input in systems subject to 
resource restrictions.

Summary of the Invention 

According to one aspect of the present invention, there is provided a method for
dynamically allocating a resource used by task entities respectively involved in processing
different input modalities, wherein the resource is dynamically allocated between the task
entities according to one or more of the following:

    actual usage of the different modalities by a user;
    confidence in the results of processing of each of the modalities;
    pragmatic information on mode usage.

Pragmatic information on mode usage provides a measure of how the target application is
set up to use input from different modes - in other words, whether input from one modality
is more important or useful than that from another modality, at least in the current
application context.

The resource concerned is, for example, communication bandwidth or processing power. 


The present invention also envisages systems for implementing the foregoing method. 



Brief Description of the Drawings 

Embodiments of the invention will now be described, by way of non-limiting example,
with reference to the accompanying diagrammatic drawings, in which:
- Figure 1 is a diagram of a mobile device with two input modalities where certain
  processing tasks in respect of those modalities are carried out on remote
  resources;
- Figure 2 is a diagram illustrating the control of the allocation of communication
  bandwidth between task entities associated with different input modalities;
- Figure 3 is a diagram similar to Figure 1 but showing bandwidth allocation control
  between modalities for first and second communication channels between
  the mobile device and the remote resources; and
- Figure 4 is a diagram similar to Figure 3 but for the case of only a single
  communication channel existing between the mobile device and the remote
  resources.



Best Mode of Carrying Out the Invention

Figure 2 illustrates an embodiment of the present invention in which task entities have
been organized by a modality manager to provide viable processing stacks for first and
second input modalities. The processing stack of each input modality includes a respective
pair of task entities that are linked via a communication channel common to both
modalities. Bandwidth restrictions on the communication channel linking the task entities of
each task-entity pair thus have the potential to affect processing of both modalities.
However, in the Figure 2 arrangement a bandwidth moderator is provided to control the
relative usage of the communication channel by the task entities of the two modalities. The
bandwidth moderator receives inputs regarding input mode usage by the user, the modal
requirements of the dialogue manager and application, and confidence in the recognition
process for each modality. The first of these inputs can be derived from any processing
stage up the processing stack formed by the task entities of each modality, though generally
the input will be derived at the stage controlled by the bandwidth moderator; the second
input comes from the application and/or dialogue and/or pragmatic manager; and the third
input can be an overall confidence measure from the top level of the application and/or
dialogue and/or pragmatic manager, or a more local confidence measure from a task entity
either controlled by the bandwidth moderator or receiving the output from an entity
controlled by the bandwidth moderator. By way of example of a locally-derived third input,
the syntactic-analysis task entity may monitor its own performance and, if it is not confident
that the correct sentence is represented in the word or phoneme lattice, indicate
this to the associated bandwidth moderator with a view to getting increased bandwidth to
represent sentences.

Whilst all three inputs are preferably provided to the bandwidth moderator, it is possible 
for the moderator to operate using just any two or any one of the inputs. Additional inputs 
may also be provided to the bandwidth moderator.

The bandwidth moderator uses the inputs it receives to determine the allocation of the
channel bandwidth between the two modalities in order to seek to optimize overall input
performance. For example:
- if a person is only using speech when both speech and gesture recognition are
  available, then the bandwidth moderator allocates less bandwidth resource to
  gesture recognition;
- if speech recognition is found to be poor (a low confidence score is measured), then
  increasing the data generated in the lower speech-modality task entities and
  allocating more bandwidth for passing on this data may well result in overall input
  performance gains outweighing any loss in gesture recognition capability resulting
  from the reduced data flow in the gesture modality processing stack.
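
The two examples above can be sketched as a simple proportional split. The description does not prescribe how the three inputs are weighed, so the scoring rule below, the 0..1 normalisation of the inputs, and the names ModalityInputs and allocate_bandwidth are all illustrative assumptions rather than the method of the embodiment.

```python
from dataclasses import dataclass


@dataclass
class ModalityInputs:
    """The three moderator inputs for one modality (assumed normalised to 0..1)."""
    usage: float       # share of recent user activity in this modality
    confidence: float  # recognition confidence reported for this modality
    pragmatic: float   # importance the application places on this modality


def allocate_bandwidth(inputs: dict[str, ModalityInputs],
                       total_kbps: float) -> dict[str, float]:
    """Split total_kbps between modalities in proportion to a demand score.

    Hypothetical rule: demand grows with usage and pragmatic weight, and
    grows further when confidence is low (poor recognition earns more data).
    """
    scores = {
        name: m.usage * m.pragmatic * (2.0 - m.confidence)
        for name, m in inputs.items()
    }
    total_score = sum(scores.values())
    if total_score == 0:
        # No modality is in use: fall back to an even split.
        return {name: total_kbps / len(inputs) for name in inputs}
    return {name: total_kbps * s / total_score for name, s in scores.items()}
```

With this rule, a user who speaks far more than they gesture, and whose speech recognition confidence is low, sees most of the channel assigned to the speech modality, which is the behaviour the two examples describe.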

In the present embodiment bandwidth allocation is effected by the moderator by controlling
the amount of data output by the task entities that use the bandwidth-limited
communication channel. How this is done depends on the type of task being carried out by
each entity. For example, where the task entities concerned are sensors, the sampling rates
of the sensors can be changed relative to each other to favour one modality over the other
as required by the bandwidth moderator. If the task entities being controlled effect feature
extraction then the bandwidth moderator can be arranged to control the number of features
extracted for each modality. Similarly, if the task entities controlled by the bandwidth
moderator effect syntactic and semantic analysis, then the depth and breadth of the word or
phoneme lattices can be controlled.
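
By way of illustration only, the translation of an allotted bandwidth share into a task-specific control setting might look as follows. The task-type labels, the setting names and every numeric range are hypothetical; the description states only which quantity is controlled for each type of task entity, not how it is scaled.

```python
def scale_setting(share: float, minimum: int, maximum: int) -> int:
    """Map a bandwidth share in 0..1 linearly onto an integer setting."""
    share = min(max(share, 0.0), 1.0)  # clamp out-of-range shares
    return minimum + round(share * (maximum - minimum))


def settings_for(task_type: str, share: float) -> dict[str, int]:
    """Return the control settings for one task entity (ranges are assumed)."""
    if task_type == "sensor":
        # Sensors: vary the sampling rate.
        return {"sampling_rate_hz": scale_setting(share, 2000, 16000)}
    if task_type == "feature_extraction":
        # Feature extraction: vary the number of features extracted.
        return {"num_features": scale_setting(share, 8, 40)}
    if task_type == "syntactic_analysis":
        # Syntactic analysis: vary depth and breadth of the lattice.
        return {
            "lattice_depth": scale_setting(share, 2, 20),
            "lattice_breadth": scale_setting(share, 1, 10),
        }
    raise ValueError(f"unknown task type: {task_type}")
```

A linear mapping is the simplest choice; a practical moderator could equally use a lookup table tuned per task entity.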

Whilst generally the task entities using the communications channel will be at the same
level in the processing stacks of each modality, this is not necessarily the case as the
moderator can be arranged to understand how to control different types of task entity to
effect the desired bandwidth allocation. Furthermore, it will be appreciated that the
bandwidth moderator can be arranged to allocate bandwidth between more than two
modalities. Again, whilst the resource allocated by the moderator in the Figure 2 example
is channel bandwidth, the moderator can be used to allocate other limited resources
between modalities such as processing power and/or memory.

Figure 3 illustrates an arrangement in which the feature-extraction task entities of two
modalities share a first communication channel to respective symbol-recognition task
entities, and the syntactic-analysis task entities of these modalities share a second
communication channel to respective semantic-analysis task entities.

The allocation of the bandwidth of the first communication channel between the two
feature-extraction task entities is controlled by a first bandwidth moderator whilst the
allocation of the bandwidth of the second communication channel between the two
syntactic-analysis task entities is controlled by a second bandwidth moderator. It would be
possible simply to have the first and second bandwidth moderators work independently,
each operating as described for the moderator of Figure 2. However, provision is instead
made for global coordination of the two moderators by a third, global, moderator. The role
of the global moderator is to guide the first and second moderators in making their
allocations. For example, the global moderator may determine that whilst the first
moderator should favour the speech feature-extraction task entity over the gesture
feature-extraction task entity, the second moderator should be more even-handed between the
syntactic-analysis task entities of the two modalities. The first and second moderators make
their final allocations taking into account respective local activity in the task entities they
control; the first and second moderators may also take account of the allocations made by
each other.
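
One way a local moderator might combine guidance from the global moderator with its own view of local activity is a weighted blend, sketched below. The blending rule, the guidance_weight parameter and the function name are assumptions made for illustration; the description says only that the local moderators take both into account.

```python
def local_allocation(guidance: dict[str, float],
                     local_activity: dict[str, float],
                     guidance_weight: float = 0.5) -> dict[str, float]:
    """Blend global guidance with local activity into shares that sum to 1.

    Both dictionaries are assumed to be keyed by the same modality names.
    """
    def normalise(values: dict[str, float]) -> dict[str, float]:
        total = sum(values.values())
        if total == 0:
            # Nothing to go on: treat all modalities equally.
            return {k: 1.0 / len(values) for k in values}
        return {k: v / total for k, v in values.items()}

    g = normalise(guidance)
    a = normalise(local_activity)
    return {
        k: guidance_weight * g[k] + (1.0 - guidance_weight) * a[k]
        for k in g
    }
```

For instance, guidance favouring speech 80/20 combined with evenly balanced local activity and the default weight yields a 65/35 split at that moderator.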


Of course, a single, global, moderator could be used to directly control allocation of 
bandwidth for both the first and second channels without the use of the local first and 
second moderators described above. 

Instead of there being two separate communication channels at respective levels in the
processing stacks of the two modalities, it may be that only a single channel is available
both for communication between the feature-extraction task entities and the
symbol-recognition task entities and for communication between the syntactic-analysis task entities
and the semantic-analysis task entities. In this case, the general configuration of moderators
shown in Figure 3 can still be employed with the global moderator now determining, for
example, the allocation of bandwidth between the two processing-stack levels involved and
the first and second moderators then each effecting a subordinate allocation at a respective
one of these levels. An alternative arrangement of moderators is depicted in Figure 4 where
the global moderator determines allocation between modalities and each modality has an
associated moderator that effects a subordinate allocation between the two concerned
levels of the processing stack handling the modality.
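
The two-stage allocation of the Figure 4 arrangement can be sketched as a product of shares: the global moderator fixes a share per modality, and each per-modality moderator divides that share between stack levels. The function name, the dictionary layout and the level labels used in the usage note are hypothetical.

```python
def hierarchical_allocation(
    total: float,
    modality_shares: dict[str, float],
    level_shares: dict[str, dict[str, float]],
) -> dict[tuple[str, str], float]:
    """Allocate `total` first between modalities, then between stack levels.

    modality_shares: share of the channel per modality (assumed to sum to 1).
    level_shares: for each modality, share per stack level (each sums to 1).
    """
    return {
        (modality, level): total * m_share * l_share
        for modality, m_share in modality_shares.items()
        for level, l_share in level_shares[modality].items()
    }
```

Calling this with a 70/30 split between speech and gesture, and a 60/40 split between the feature-extraction and syntactic-analysis levels of the speech stack, gives the speech feature-extraction link 42% of the channel. The single-channel variant of the Figure 3 configuration corresponds to swapping the order, with levels as the outer split and modalities as the inner one.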

It will be appreciated that many variants are possible to the above-described embodiments
of the invention.

With regard to the location of the moderators themselves, these can be located locally or
remote from the task entities they control. However, at least notionally, the resource
moderators can be considered as part of the modality manager of the device. It may be
noted that a resource moderator can be arranged to restrict resource access to zero for a
particular modality in appropriate circumstances, thereby effectively eliminating that
modality; preferably, however, the presence or absence of any particular modality is
determined by higher-level functionality of the modality manager and the resource
moderators are arranged always to provide at least a minimum resource level to each
modality that the higher-level functionality of the modality manager has decided should be
present.
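
The preferred behaviour, in which every modality that is present always receives at least a minimum resource level, can be sketched as follows. Reserving a fixed floor per modality and dividing the remainder in proportion to the requested shares is one assumed policy among many; the name allocate_with_floor and the floor parameter are illustrative.

```python
def allocate_with_floor(requested: dict[str, float],
                        floor: float) -> dict[str, float]:
    """Return shares summing to 1 in which every modality gets at least `floor`.

    Each modality is first given `floor`; whatever remains is divided in
    proportion to the requested shares.
    """
    n = len(requested)
    if floor < 0 or floor * n > 1.0:
        raise ValueError("floor must be non-negative and at most 1/n")
    remainder = 1.0 - floor * n
    total = sum(requested.values())
    if total == 0:
        # No preference expressed: divide the remainder evenly.
        return {name: floor + remainder / n for name in requested}
    return {
        name: floor + remainder * value / total
        for name, value in requested.items()
    }
```

With a floor of 0.1, a request that would give speech everything and gesture nothing becomes 0.9 for speech and 0.1 for gesture, so the gesture modality is throttled but not eliminated.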



Whilst the particular task entity instances used in each modality processing stack can be 
predetermined or can be constituted by an ad hoc collection of available instances under 
the control of the modality manager, it is also possible to arrange for some or all of these
entity instances to be predetermined (where all task entity instances are predetermined, the 
modality manager is not involved in organizing task entities to form viable modality 
processing stacks). 

Although in the above-described embodiments resource allocation is effected by
controlling operation of the task entities to vary their usage of the resource, it will be
appreciated that allocation can be effected in other ways, such as by limiting data delivery to
the resource from the task entity being regulated, either by queuing the data or by
selective culling of that data.
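
The queuing and culling alternatives can be contrasted with a small regulator placed between a task entity and the resource. This is a sketch under stated assumptions: the class name DeliveryRegulator, the per-tick byte budget and the drop-what-does-not-fit culling rule are all hypothetical, and a truly selective cull would rank data by importance rather than by arrival order.

```python
from collections import deque


class DeliveryRegulator:
    """Limits the bytes a task entity may deliver to the resource per tick."""

    def __init__(self, budget_per_tick: int, policy: str = "queue"):
        if policy not in ("queue", "cull"):
            raise ValueError("policy must be 'queue' or 'cull'")
        self.budget = budget_per_tick
        self.policy = policy
        self.backlog: deque[bytes] = deque()

    def tick(self, packets: list[bytes]) -> list[bytes]:
        """Accept new packets and return those released during this tick."""
        self.backlog.extend(packets)
        released: list[bytes] = []
        used = 0
        # Release packets in arrival order while they fit in the budget.
        # Note: a single packet larger than the budget is never released.
        while self.backlog and used + len(self.backlog[0]) <= self.budget:
            packet = self.backlog.popleft()
            released.append(packet)
            used += len(packet)
        if self.policy == "cull":
            # Culling: discard whatever did not fit instead of holding it.
            self.backlog.clear()
        return released
```

Under the queue policy, excess data is delayed to a later tick; under the cull policy it is lost, which trades completeness for latency.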



CLAIMS

1. A method for dynamically allocating a resource used by task entities respectively
involved in processing different input modalities, wherein the resource is dynamically
allocated between the task entities according to one or more of the following:
    actual usage of the different modalities by a user;
    confidence in the results of processing of each of the modalities;
    pragmatic information on mode usage.


2. A method according to claim 1, wherein the resource is communication bandwidth. 

3. A method according to claim 1, wherein the resource is processing power.

4. A method according to claim 1, wherein the resource is memory.

5. A method according to any one of the preceding claims applied to each of two separate 
resources each used by different respective entities of said different input modalities, the 
allocation of the two resources being independent of each other. 


6. A method according to any one of claims 1 to 4 applied to each of two separate 
resources each used by different respective entities of said different input modalities, the 
allocation of the two resources being jointly controlled. 

7. A method according to any one of claims 1 to 4 wherein said resource is used by
multiple task entities for each modality, the resource being first allocated between 
modalities and then between task entities in the same modality. 

8. A method according to any one of claims 1 to 4 wherein said resource is used by
multiple task entities for each modality, the resource being first allocated between different
groups of equivalent task entities of different modalities and then between task entities of
the same group.

9. A method according to any one of the preceding claims, wherein resource allocation is
effected by controlling operation of the task entities to vary their usage of the resource.

10. A bandwidth allocation method comprising the steps of:

(a) establish a connection between an ad hoc collection of co-operating devices and a
network-based service or series of services,

(b) allot available bandwidth to each modality based on a priori knowledge of
requirements and usage,

(c) monitor mode usage and confidence in allotted recognition task,

(d) monitor mode usage and confidence in recognition across all tasks, 

(e) assess pragmatic and application usage of modes, 

(f) collectively moderate bandwidth allocation between devices and network services 
to favour modes with high activity or poor performance.

ABSTRACT

Dynamic Resource Allocation in a Multimodal System 

A limited resource, such as communication bandwidth or processing power, is dynamically
allocated between task entities that are respectively involved in processing different input
modalities. The allocation is effected in dependence on one or more of the actual usage of
the different modalities by a user, the confidence in the results of processing of each of the
modalities, and pragmatic information on mode usage.